The only categorical feature in this table, "family", has 33 unique values, which is not many.
That means we can easily use one-hot encoding during model training.
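
A minimal sketch of what one-hot encoding the 'family' column could look like with pandas. The toy frame and its sales numbers are made up for illustration; only the column name 'family' and the category names come from the notes here.

```python
import pandas as pd

# Toy frame: category names appear in the notes, sales values are invented.
df = pd.DataFrame({
    "family": ["automotive", "books", "lingerie", "books"],
    "sales": [3.0, 0.0, 5.0, 1.0],
})

# One indicator column per distinct 'family' value.
encoded = pd.get_dummies(df, columns=["family"], prefix="family", dtype=int)
print(encoded.columns.tolist())
```

With 33 families this adds 33 indicator columns, which is still a small feature space for most models.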

Actually, these plots don't help much: looking at them, we can only catch some common trends that describe global changes in the sales of certain products.
For example, sales appear to be 0 on 1 January every year (this should be verified, but it looks obvious from the plots).
In general, sales show an upward trend (for example, 'automotive', 'bread/bakery', 'grocery i', 'personal care', ...).
Some product families show a downward trend (such as 'lingerie').
Some families have very unusual sales patterns ('books', 'produce', 'frozen foods', 'ladieswear', ...), and we can't yet say what causes them.
Some families also show seasonal increases in sales ('school and office supplies', 'liquor, wine, beer', 'grocery ii', 'frozen foods').
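
The claim about 1 January can be checked directly instead of read off the plots. The sketch below assumes a training table with a 'date' column and a 'sales' column; both names and all values here are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical miniature of the training table (made-up values).
train = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2017-01-01", "2017-01-02"]
    ),
    "sales": [0.0, 120.5, 0.0, 98.0],
})

# Keep only the rows dated 1 January, then total the sales per year.
jan1 = train[(train["date"].dt.month == 1) & (train["date"].dt.day == 1)]
new_year_totals = jan1.groupby(jan1["date"].dt.year)["sales"].sum()

# True only if every year's 1 January total is exactly zero.
all_zero = bool((new_year_totals == 0).all())
print(new_year_totals, all_zero)
```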

These plots show that, roughly speaking, the sales distributions of the families fall into two groups:
1) Normal or close to normal (such as 'automotive', 'bread/bakery', 'cleaning', 'eggs', 'grocery', 'lingerie', ...). Interestingly, most distributions in this group are right-skewed (the skewness coefficient is positive). The proof is below.
2) Distributions where most of the density is concentrated at or near zero. The rest of the data is distributed in various ways (for some families, such as 'home and kitchen i' and 'ladieswear', it looks roughly normal). This suggests that such categories aren't essential goods, which is why daily sales are mostly equal to 0.
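
One way to quantify the two groups is to compute, per family, the sample skewness and the share of zero-sales rows. This is only a sketch on invented numbers; the column names 'family' and 'sales' are assumptions about the table layout.

```python
import pandas as pd

# Invented values: 'eggs' has a long right tail, 'books' is mostly zeros.
sales = pd.DataFrame({
    "family": ["eggs"] * 5 + ["books"] * 5,
    "sales": [1, 2, 2, 3, 10, 0, 0, 0, 0, 3],
})

# Positive skewness means a longer right tail (right-skewed distribution).
skew = sales.groupby("family")["sales"].skew()

# Fraction of rows with zero sales in each family.
zero_share = sales["sales"].eq(0).groupby(sales["family"]).mean()

print(skew)
print(zero_share)
```

A family with a high zero share belongs to the second group regardless of its skewness, so it makes sense to look at both numbers together.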

It becomes clear that promotions have a fairly strong positive effect on sales (in general, across all the data).
As we can see in the plots above, most of the 'family' values (but not all of them!) support this. Interestingly, 'books' had no promotions during the whole observation period.
Moreover, correlation is sensitive to outliers, so this coefficient may not be accurate.
Nevertheless, the 'onpromotion' feature is useful for the predictions.
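
To illustrate the outlier concern, the sketch below compares the Pearson coefficient with a rank-based (Spearman) coefficient on invented data containing one extreme sales value. Spearman is computed here as Pearson on ranks; it is offered as a possible cross-check, not as the method used for the plots above.

```python
import pandas as pd

# Invented data: sales rise with promotions, the last row is an outlier.
df = pd.DataFrame({
    "onpromotion": [0, 1, 2, 3, 4, 5],
    "sales": [10.0, 12.0, 15.0, 14.0, 18.0, 200.0],
})

# Pearson works on raw values, so the outlier dominates the result.
pearson = df["onpromotion"].corr(df["sales"])

# Spearman is Pearson applied to ranks, so extreme values only count as "largest".
spearman = df["onpromotion"].rank().corr(df["sales"].rank())

print(round(pearson, 3), round(spearman, 3))
```

If the two coefficients disagree strongly for a family, the Pearson value for that family is probably driven by a few extreme days.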

The store id by itself doesn't help with sales prediction, so it should be replaced with the corresponding information about the store.
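
Replacing the store id with store attributes amounts to a left join on the id column. In this sketch the key name 'store_nbr' and all values are assumptions; the attribute names 'city', 'state', 'type' and 'cluster' are the ones discussed in these notes.

```python
import pandas as pd

# Hypothetical sales rows keyed by store id.
train = pd.DataFrame({"store_nbr": [1, 1, 2], "sales": [5.0, 7.0, 3.0]})

# Hypothetical store metadata, one row per store (placeholder values).
stores = pd.DataFrame({
    "store_nbr": [1, 2],
    "city": ["city_a", "city_b"],
    "state": ["state_a", "state_b"],
    "type": ["D", "A"],
    "cluster": [13, 8],
})

# Left join keeps every sales row; validate guards against duplicate store ids.
merged = train.merge(stores, on="store_nbr", how="left", validate="many_to_one")

# The raw id is no longer needed once its attributes are attached.
merged = merged.drop(columns="store_nbr")
print(merged)
```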

It seems that keeping both 'state' and 'city' isn't necessary, so we should delete one of them.
I'll drop the 'state' feature, because 'city' carries more information (a state can contain several cities).
'city' has only 7 more unique values than 'state', so in terms of model complexity there should not be much difference.
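
Dropping 'state' loses nothing only if every city maps to exactly one state. A short check of that assumption, on a made-up store table with placeholder names:

```python
import pandas as pd

# Made-up store table: two cities share one state.
stores = pd.DataFrame({
    "city": ["city_a", "city_b", "city_c", "city_a"],
    "state": ["state_x", "state_x", "state_y", "state_x"],
})

# Number of distinct states observed for each city.
states_per_city = stores.groupby("city")["state"].nunique()
city_determines_state = bool((states_per_city == 1).all())

# Drop 'state' only when it is fully implied by 'city'.
if city_determines_state:
    stores = stores.drop(columns="state")

print(city_determines_state, stores.columns.tolist())
```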

As we can see, the store type has a strong effect on the target. Type D leads, which is logical, because the plot above shows that type D stores are the most numerous. That is why the 'type' feature is very important.
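
Since type D leads partly because it has the most stores, it can help to look at sales per store next to total sales. The sketch below uses invented numbers and assumes columns named 'store_nbr', 'type' and 'sales'.

```python
import pandas as pd

# Invented data: three type D stores, one type A store.
df = pd.DataFrame({
    "store_nbr": [1, 2, 3, 4],
    "type": ["A", "D", "D", "D"],
    "sales": [90.0, 40.0, 50.0, 60.0],
})

# Total sales and number of distinct stores for each store type.
by_type = df.groupby("type").agg(
    total_sales=("sales", "sum"),
    n_stores=("store_nbr", "nunique"),
)

# Normalising by store count separates "many stores" from "strong stores".
by_type["sales_per_store"] = by_type["total_sales"] / by_type["n_stores"]
print(by_type)
```

In this toy example type D has the largest total while type A has the largest per-store value, which is the kind of difference the raw totals would hide.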

The 'cluster' feature doesn't correlate with the target, and I can't see any dependency between the two. But I think this feature can still be useful, because it groups similar stores together.